Skip to content

Conversation

YuriZmytrakov
Copy link
Collaborator

@YuriZmytrakov YuriZmytrakov commented Sep 29, 2025

Description:

This PR implements returned object limit constraints for collections and items search endpoints which are set in environment variables. The changes ensure consistent behavior between GET and POST search methods.

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@YuriZmytrakov YuriZmytrakov marked this pull request as draft September 29, 2025 16:44
@YuriZmytrakov YuriZmytrakov force-pushed the CAT-1490-2 branch 6 times, most recently from 124170f to 48e3b2e Compare September 29, 2025 19:27
@YuriZmytrakov YuriZmytrakov changed the title first commit Default and max allowed returned values for collections and items Sep 29, 2025
@YuriZmytrakov YuriZmytrakov force-pushed the CAT-1490-2 branch 6 times, most recently from 2074b15 to d6137b8 Compare September 30, 2025 10:39
@YuriZmytrakov YuriZmytrakov marked this pull request as ready for review September 30, 2025 21:59
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, just a few things.

README.md Outdated
| `STAC_GLOBAL_ITEM_MAX_LIMIT` | Configures the maximum number of STAC items that can be returned in a single search request. | `100` | Optional |
| `STAC_DEFAULT_ITEM_LIMIT` | Configures the default number of STAC items returned when no limit parameter is specified in the request. | `10` | Optional |
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional |
| `ENV_MAX_LIMIT` | Configures the environment variable in SFEOS to override the default `MAX_LIMIT`, which controls the limit parameter for returned items and STAC collections. | `10,000` | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YuriZmytrakov Do we want to keep ENV_MAX_LIMIT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ENV_MAX_LIMIT is used by max_result_window, I will need a follow up task to remove the old limit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it interferes with the logic in this pr?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was your pr: #434

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s correct. Since the ENV_MAX_LIMIT env var is no longer used for controlling amount of returned items/collections, I’ll need to revert my changes from the previous PR. I have also removed it from the README and updated the changelog. Apologies for confusion on this topic.

limit = default_limit

if global_max_limit > 0:
limit = min(limit, global_max_limit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we do this - limit = min(limit, global_max_limit) - are we ignoring the fact that someone may want a higher global limit? We should write down what the expectations are and then add a test I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test these cases - let me know what you think:

Test with no global max limit set (should allow any limit).
Test with global max limit set (should cap at that limit).
Test with no limit specified (should use default).
Test with limit higher than global max (should be capped).
Test with limit lower than global max (should be respected).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was agreed that we should set a global limit for /collections and /items, as allowing too many results slows down the API. The value can always be overridden by an env var, also I am not sure if we should allow unlimited returned items ever.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonhealy1 I have added the tests, removed not relevant test:
Test with no limit specified for items/collections (should use default). -test_default_item_limit_without_limit_parameter_set(), test_default_collection_limit()

Test with limit higher than global max items/collections (should be capped). - test_global_collection_max_limit_set(), test_default_collection_limit()

Test with limit lower than global max (should be respected). - not sure if we need this test.

Test with no global max limit set (should allow any limit). - not relevant, unless agree to allow any limit. Also, overpopulating the test collections/items slows down the test.

Test with global max limit set for items/collection (should cap at that limit). - test_global_collection_max_limit_set(), test_global_item_max_limit_set()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the value be overridden by an env var? Maybe I am missing something here?

Copy link
Collaborator

@jonhealy1 jonhealy1 Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Test with no global max limit set (should allow any limit). - not relevant, unless agree to allow any limit. Also, overpopulating the test collections/items slows down the test"

@YuriZmytrakov No one envisions having an api returning unlimited items. You set the default max limit for Items to 100. If a User wants to return 110 items, they are unable to (unless I am missing something, please explain). There is no reason to place these restrictions. People who use this project expect flexibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test this by showing how the limit param filters down to execute_search - test what its final value is - without ingesting over 100 Items?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with the team — yes, we should not restrict the number of items, collections that can be returned by default. This was fixed by removing the default values items=100, collections=300. For this reason, I’m adding proper tests to verify that when no global max limit is set (any limit is allowed) populated 20 items, collections and querying to ensure any limit is allowed.

limit = min(limit, global_limit)
else:
limit = global_limit
global_max_limit = int(os.getenv("STAC_GLOBAL_COLLECTION_MAX_LIMIT", 300))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think global_max_limit shouldn't have a default. If it's not set then there is no max limit.

Copy link
Collaborator Author

@YuriZmytrakov YuriZmytrakov Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default max value for collections has been removed.

limit = default_limit

if global_max_limit > 0:
limit = min(limit, global_max_limit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_search passes limit to post_search, so we should only have to do this in post_search

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like limit for post_search was originally populated within the get_search func, removing populating global and default limits, stops this function to work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please pass the limit as is from get_search - this logic only needs to be in post_search. Make the change, if it's not working, I can help fix it. This is from get_search - you may need to remove the first line?:

limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))

base_args = {
"collections": collections,
"ids": ids,
"bbox": bbox,
"limit": limit,
"token": token,
"query": orjson.loads(query) if query else query,
"q": q,
}

Copy link
Collaborator Author

@YuriZmytrakov YuriZmytrakov Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is correct, there is no need to duplicate the same logic in get_search and post_search. I have fixed it.

Raises:
HTTPException: If there is an error with the cql2_json filter.
"""
global_max_limit = int(os.getenv("STAC_GLOBAL_ITEM_MAX_LIMIT", 100))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concerned here too about setting a default. It may work for some Users but maybe not others.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same, it was agreed that the default number of items should be limited to avoid slowing down the API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set this parameter in your api instance, but other people who set up their own api, using this project, may choose to return more than 100 items. Who agreed to this? When you set up your api on your servers, you set this env var, and then everything is fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default param of 100 has been removed, thank you!

@YuriZmytrakov YuriZmytrakov force-pushed the CAT-1490-2 branch 2 times, most recently from d89c9e2 to ab40720 Compare October 8, 2025 19:06
Raises:
HTTPException: If there is an error with the cql2_json filter.
"""
global_max_limit = int(os.getenv("STAC_GLOBAL_ITEM_MAX_LIMIT", 100))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set this parameter in your api instance, but other people who set up their own api, using this project, may choose to return more than 100 items. Who agreed to this? When you set up your api on your servers, you set this env var, and then everything is fine.

limit = default_limit

if global_max_limit > 0:
limit = min(limit, global_max_limit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the value be overridden by an env var? Maybe I am missing something here?

limit = default_limit

if global_max_limit > 0:
limit = min(limit, global_max_limit)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please pass the limit as is from get_search - this logic only needs to be in post_search. Make the change, if it's not working, I can help fix it. This is from get_search - you may need to remove the first line?:

limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))

base_args = {
"collections": collections,
"ids": ids,
"bbox": bbox,
"limit": limit,
"token": token,
"query": orjson.loads(query) if query else query,
"q": q,
}

README.md Outdated
| `STAC_GLOBAL_ITEM_MAX_LIMIT` | Configures the maximum number of STAC items that can be returned in a single search request. | `100` | Optional |
| `STAC_DEFAULT_ITEM_LIMIT` | Configures the default number of STAC items returned when no limit parameter is specified in the request. | `10` | Optional |
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional |
| `ENV_MAX_LIMIT` | Configures the environment variable in SFEOS to override the default `MAX_LIMIT`, which controls the limit parameter for returned items and STAC collections. | `10,000` | Optional |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it interferes with the logic in this pr?

@jonhealy1
Copy link
Collaborator

jonhealy1 commented Oct 9, 2025

@YuriZmytrakov If this pr is merged there will be 5 environment variables dealing with limit. I think it's too confusing as is. The expectations will need to be documented in the Readme. You may need to allot more time to this issue.

  1. no limit set in request, defaults to default limit env var (can have a default value)
  2. limit set in request, this value has precedence, is limited by max limit env var (optional) (if it is set) (no default value)

ENV_MAX_LIMIT should be removed unless you can justify keeping it.

  • if company ABC wants to set up this api on their servers and allow their customers to do something like search?limit=100000 they can - it may not makes sense - they may want to set the max limit env vars - we can recommend that they do in the Readme

@jonhealy1
Copy link
Collaborator

I worked on a project where I was fetching hundred of items at a time from mongodb. I divided up the screen into 16 grids and made 16 asynchronous calls to the database to fetch data with a very high limit set. By using the fields extension I was limiting the amount of data sent which was then adjusted at different zoom levels. Zoomed out, I was only fetching geospatial data to display points on a map. But as Users zoomed in, more data was returned. At high zoom levels, Users could click on a point and get the full data for that point.

@YuriZmytrakov YuriZmytrakov force-pushed the CAT-1490-2 branch 6 times, most recently from 484c63b to 4aa8b7f Compare October 10, 2025 08:18
@YuriZmytrakov YuriZmytrakov force-pushed the CAT-1490-2 branch 4 times, most recently from 700a222 to 0402484 Compare October 10, 2025 10:51
Yuri Zmytrakov and others added 4 commits October 10, 2025 13:28
…items

Ensure collections and items endpoints use their respective default and maximum limit values.
**Related Issue(s):**

- None

**Description:**

Fixed

- Issue where token, query param was not being passed to POST
collections search logic.
- Issue where datetime param was not being passed from POST collections
search logic to Elasticsearch.
- Collections search tests to ensure both GET /collections and GET/POST
/collections-search endpoints are tested.

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
**Related Issue(s):**

- None

**Description:**

- Add `Latest News` section to readme
- Add `CloudFerro` logo to the Sponsors and Supporters section

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
**Related Issue(s):**

- None

**Description:**

- Make latest news section scrollable, adjust logo size


**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. Just one more thing..

body_data = await request.json()
body_limit = body_data.get("limit")
except Exception:
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using request.query_params.get("limit") or body_data.get("limit"), can you just use limit from the function params?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the limit can be taken from the func params. However, since the default limit=10, I can’t determine whether the user explicitly provided the ?limit param or it's default limit=10. The ?limit parameter should override the default_limit from the env var.
Option 1: Use a boolean to determine whether the user the ?limit or limit param body, and if so, use that value as the limit in the function.
Option 2: Keep the proposed approach and simply pull the limit from request.query_params.get("limit") or body_data.get("limit").

Let me know your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants